Method for Generating and Processing Transition Streams

CROSS REFERENCE TO RELATED APPLICATIONS 

This application is a continuation-in-part of U.S. patent application serial number 
09/347,213, filed July 2, 1999 for FRAME-ACCURATE SEAMLESS SPLICING OF 
INFORMATION STREAMS (attorney docket number 13235) which is incorporated herein 
by reference in its entirety. This application claims the benefit of U.S. provisional patent 
application serial number 60/129,275, filed April 14, 1999 and incorporated herein by 
reference in its entirety. 

The invention relates to communications systems generally and, more particularly, 
the invention relates to a method for splicing or concatenating information streams in a 
substantially seamless manner. 

BACKGROUND OF THE DISCLOSURE 

In several communications systems the data to be transmitted is compressed so that 
the available bandwidth is used more efficiently. For example, the Moving Pictures Experts 
Group (MPEG) has promulgated several standards relating to digital data delivery systems. 
The first, known as MPEG-1, refers to ISO/IEC standards 11172 and is incorporated herein
by reference. The second, known as MPEG-2, refers to ISO/IEC standards 13818 and is 
incorporated herein by reference. A compressed digital video system is described in the 
Advanced Television Systems Committee (ATSC) digital television standard document 
A/53, and is incorporated herein by reference. 

It is important to television studios and other "consumers" of information streams to 
be able to concatenate or splice between information streams (e.g., transport encoded 
program streams incorporating video, audio and other associated information sub-streams) 
in a substantially seamless and frame accurate manner. "Frame accurate" means that a 
splice occurs precisely at the frames selected by the user, regardless of the frame type of the 
encoded frame (e.g., I-, P- or B-frame encoding). "Seamless splice" means a splice which 
results in a continuous, valid MPEG stream. Thus, a frame accurate seamless splicer will 
preserve an exact number of frames when performing a frame accurate seamless splice of a 
first information stream into a second information stream (e.g., a transport encoded program 
comprising a 900 video frame commercial presentation may be scheduled into a "slot" of
exactly 900 frames). 

Several known methods utilize variations of the following procedure: decoding an 
"in stream" and an "out stream" to a baseband or elementary level, performing a splice 
operation and re-encoding the resulting spliced stream. These methods provide frame 
accurate seamless splices, but at great expense. 

In an improved method allowing seamless splicing at the transport stream level,
MPEG and MPEG-like information streams including, e.g., video information may be
spliced together in a relatively seamless manner by defining "in-points" and "out-points" for
each stream that are indicative of, respectively, appropriate stream entry and exit points.
For example, a packet containing a video sequence header in an MPEG-like video stream 
comprises an appropriate in-point. An MPEG-like information stream that contains such in- 
points and out-points is said to be spliceable. The Society of Motion Picture and Television 
Engineers (SMPTE) has proposed a standard SMPTE 312M defining such splicing points 
entitled "Splice Points for MPEG-2 Transport Streams," which is incorporated herein by
reference in its entirety. 

Unfortunately, the placement of such in-points and out-points is defined by factors
such as image frame encoding mode, group of pictures (GOP) structure and the like.
Therefore, an end user trying to seamlessly splice between information streams cannot do
so in a "frame accurate" manner if the desired splicing points are not appropriate in-points
or out-points.

Therefore, it is seen to be desirable to provide a method and apparatus that allows 
seamless, frame accurate splicing of MPEG-like transport streams. Moreover, it is seen to 
be desirable to provide a method and apparatus for applying such a seamless, frame 
accurate splicing method and apparatus to the particular environment of a television studio 
or other video serving environment. 

SUMMARY OF THE INVENTION

The invention comprises a method for generating a transition stream and processing 
video, audio or other data within the transition stream using, respectively, pixel domain 
processing, audio domain processing or other data domain processing. Alternate
embodiments of the invention ensure that non-video data related to image frames forming the
transition stream are included within the transition stream. Multiple and single program
transport stream splicing operations are supported by the invention.

Specifically, in a system for processing transport streams including image frames, a
method according to the invention for generating a transition stream for transitioning from
a first transport stream to a second transport stream in a substantially seamless manner
comprises the steps of: decoding a portion of the first transport stream including at least a
target out-frame representing a last image frame of the first transport stream to be
presented; decoding a portion of the second transport stream including at least a target
in-frame representing a first image frame of the second transport stream to be presented;
processing, using a pixel domain process, at least one of the decoded image frames; and 
encoding a plurality of the decoded image frames, including the target out-frame and the 
target in-frame, to produce the transition stream. 

BRIEF DESCRIPTION OF THE DRAWINGS

The teachings of the present invention can be readily understood by considering the 
following detailed description in conjunction with the accompanying drawings, in which: 

FIG. 1 depicts a high level block diagram of a television studio; 

FIGS. 2A and 2B are graphical representations of a splicing operation useful in 
understanding the invention; 

FIG. 3 depicts an embodiment of a play to air server suitable for use in the television 
studio of FIG. 1;

FIGS. 4A, 4B and 4C are graphical representations of a splicing operation useful in 
understanding the invention; 

FIGS. 5 and 6 depict tabular representations of image frame display order and image 
frame transmission orders useful in understanding the invention; 

FIG. 7 depicts a flow diagram of a method for generating a transition stream or 
transition clip; 

FIG. 8 depicts a flow diagram of a method of determining which information frames 
within a from-stream should be included within the transition stream; 

FIG. 9 depicts a flow diagram of a method for determining which information
frames within a to-stream should be included within the transition stream; 

FIG. 10 depicts a flow diagram of a method for indexing an information stream;

FIG. 11 depicts a tabular representation of a meta file suitable for use in the play to
air server of FIG. 3;

FIG. 12 depicts a flow diagram of a method for generating a transition stream or
transition clip incorporating pixel domain effects; and

FIG. 13 depicts a flow diagram of a method for generating a transition stream or
transition clip according to an embodiment of the invention.

To facilitate understanding, identical reference numerals have been used, where
possible, to designate identical elements that are common to the figures.

DETAILED DESCRIPTION

After considering the following description, those skilled in the art will clearly
realize that the teachings of the invention can be readily utilized in any information
processing system in which a need exists to perform seamless, frame accurate splicing of,
e.g., MPEG-like transport streams including video sub-streams.

An embodiment of the invention will be described within the context of a television
studio environment where a play to air controller causes stored video streams (e.g., video
segments or "clips") to be retrieved from a server and spliced together in a seamless, frame
accurate manner to produce, e.g., an MPEG-2 compliant video stream suitable for
transporting to a far end decoder. However, since the scope and teachings of the invention
have much broader applicability, the invention should not be construed as being limited to
the disclosed embodiments. For example, the invention has applicability to server-based
asset streaming for cable headends, insertion of local commercials and trailers for digital
cinema, frame accurate Internet-based streaming of MPEG-2 transport streams and limited
production facilities (i.e., those production facilities performing only the composition of
segments for news or other applications).

Throughout this description various terms are used to describe the invention. Unless
modified by the following description, several of the terms are defined as follows: A
spliced stream comprises a stream formed by concatenating an exit-stream (or from-stream)
to an entry-stream (or to-stream) at a particular splicing point. An exit-frame is the last
frame of an exit-stream. An entry-frame is the first frame of an entry-stream.

FIG. 1 depicts a high level block diagram of a television studio. Specifically, the
studio of FIG. 1 comprises a play to air server 110, a mass storage device 115, a play to air
controller 120, a router 130 and a network interface device (NID) 140.

The mass storage device 115 is used to store a plurality of, illustratively, MPEG-2
transport streams including encoded video sub-streams and associated audio streams
providing a program. The mass storage device 115 may also be used to store other types of
information streams, such as packetized or non-packetized elementary streams comprising
video data, audio data, program information and other data.

The play to air server 110 retrieves, via signal path S1, information streams from the
mass storage device 115. The retrieved information streams are processed, in response to a
control signal produced by the play to air controller 120 (e.g., a play list), to produce an
output transport stream comprising a plurality of concatenated transport streams. The play
to air server 110 provides the output transport stream and is coupled to the router 130 via
signal path S2.

The play to air controller 120 provides control information to the play to air server
110 and other studio equipment (not shown) via a signal path S3, which is coupled to the
router 130. The router 130 is used to route all control and program information between the
various functional elements of the television studio 100. For example, control information
is passed from the play to air controller 120 via signal path S3 to the router 130, which then
passes the control information to the play to air server 110 via signal path S2. Optionally, a
direct control connection CONTROL between the play to air controller 120 and the play to
air server 110 is used for passing control information.

The router 130 receives the output transport stream from the play to air server 110
via signal path S2 and responsively passes the output transport stream to other studio
components (e.g., editors, off-line storage elements and the like) via signal path S5, or to
the network interface device 140 via signal path S6.

The network interface device (NID) 140 is used to communicate the output transport
stream, control information or any other information between the television studio 100 of
FIG. 1 and other studios (not shown). Optionally, the NID receives information streams
from other studios, remote camera crews, broadcasters and the like. These streams are
[...] storage in the mass storage device (with or without processing).

The play to air server 110 and mass storage device 115 may be implemented using a
[...] Server" manufactured by SGI of Mountain View, California.

The play to air controller 120 comprises a play list 125 corresponding to the
information streams or clips that are to be scheduled for subsequent incorporation into the
output transport stream of the play to air server 110. The play list 125 [...] exact frame
entry and exit locations of each of the information streams or clips that are to be retrieved
[...] or spliced into the output transport [...] for each of the information streams or clips.

The play to air server 110, in response to a control signal from the play to air
[...] from the mass storage device and splices the clips in a seamless, frame accurate
manner according to the frame entry and exit information within the control signal to
[...] transport stream. [...]
no syntax errors or discontinuities to any other studio component, including any remote
[...] provided by the network interface device 140. The splicing or concatenation
[...] performed by the play to air server will be explained in more detail below with
respect to FIG. 2A and FIG. 2B.

FIG. 2A and FIG. 2B are graphical representations of a splicing operation useful in
understanding the invention. Specifically, FIG. 2A graphically depicts a frame accurate,
seamless splicing operation of two 30 frames per second MPEG-2 transport stream clips
using a transition clip (230) to produce a resulting spliced 30 frames per second
MPEG-2 transport stream clip (240). The transition stream 230 is formed using the first
stream 210 and the second stream 220. The resulting spliced stream 240 comprises
the concatenation of portions of the first 210, transition 230 and second 220 streams. The
resulting spliced stream 240 comprises a "knife edge" or frame accurate splice between the
first and second streams at an out-point (210-OUT) of the first stream 210 and an in-point
(220-IN) of the second stream 220.

FIG. 2B depicts various SMPTE timecodes associated with the streams or clips
depicted in FIG. 2A. The first stream or clip 210 (STREAM A) comprises a plurality of
frames including a first frame 210-ST beginning at a time t0, illustratively at a respective
SMPTE timecode of 00:00:00:00; a transition out frame 210-TRANS beginning at time t1;
an out-frame 210-OUT ending at a time t2, illustratively at a respective SMPTE timecode of
00:00:02:13; and a last frame 210-END starting at a time greater than time t2.

The out-frame 210-OUT comprises the last frame of the first stream 210 to be
displayed (i.e., the frame immediately preceding the desired splice point). The out-frame
210-OUT will be included within the transition stream 230. The transition out frame
210-TRANS comprises the last frame of the first stream 210 to be transmitted. That is, the
transition stream 230 will be concatenated to the first stream 210 immediately after the
transition out frame 210-TRANS.

The second stream or clip 220 (STREAM B) comprises a plurality of frames
including a first frame 220-ST beginning at a respective SMPTE timecode of 00:00:00:00;
an in-frame 220-IN beginning at time t2, illustratively at a respective SMPTE timecode of
00:00:00:23; a transition in frame 220-TRANS beginning at time t3; and a last frame
220-END ending at a time t4, illustratively at a respective SMPTE timecode of 00:00:04:17.
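
The timecodes quoted for FIG. 2B can be related to frame positions with a short calculation. The following is a minimal sketch, assuming non-drop-frame HH:MM:SS:FF timecodes at the 30 frames per second rate stated for the clips; the function name `timecode_to_frame` is hypothetical and is not part of the disclosure.

```python
FPS = 30  # frame rate stated for the clips of FIG. 2A; non-drop-frame is assumed


def timecode_to_frame(timecode: str, fps: int = FPS) -> int:
    """Convert an HH:MM:SS:FF timecode into a zero-based frame index."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not 0 <= frames < fps:
        raise ValueError(f"frame field {frames} is out of range for {fps} fps")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


# Timecodes taken from the description of FIG. 2B.
out_frame_index = timecode_to_frame("00:00:02:13")  # out-frame 210-OUT
in_frame_index = timecode_to_frame("00:00:00:23")   # in-frame 220-IN
```

Working in frame indices rather than timecode strings makes it straightforward to count how many frames of each clip fall on either side of the splice point.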

The in-frame 220-IN comprises the first frame of the second stream 220 to be
displayed (i.e., the frame immediately following the desired splice point). The in-frame
220-IN will be included within the transition stream 230. The transition in frame
220-TRANS comprises the first frame of the second stream 220 to be transmitted. That is,
the transition in frame 220-TRANS will be the first frame of the second stream 220
concatenated to the transition stream 230.

The transition stream or clip 230 (STREAM T) is a data structure well adapted to
providing seamless, frame accurate splicing of video streams. The transition stream or clip
230 (STREAM T) comprises a plurality of frames including a first frame 230-ST beginning
at a time t1; and a last frame 230-END ending at time t3. The transition clip comprises
frames from both the first stream 210 and the second stream 220, including the respective
in- and out-frames. The beginning and end of the transition clip are depicted in FIG. 2B as,
respectively, time t1 and t3. It must be noted that these times, and the actual first and last
frames of the transition stream, will be defined according to methods that will be
described below with respect to FIGS. 8 and 9.

The resulting spliced stream 240 comprises a plurality of frames [...]
00:00:00:00; and a last frame 240-END ending at time t4, illustratively at a respective SMPTE
timecode of 00:00:04:17. The spliced stream 240 comprises [...] frames from the first clip
210 (i.e., t0 through t2) and 115 frames from the second clip 220 (i.e., t2 through t4).

The spliced stream 240 depicted in FIG. 2A comprises the first 210 and second 220
[...] a knife edge [...] 220-IN. [...]
manner, regardless of the frame type of the out (exit) and in (entry) frames.

It should be noted that under ideal splicing conditions (discussed in the SMPTE
[...] transition clip that may be generated under the ideal conditions.

FIG. 3 depicts an embodiment of a play to air server suitable for use in the television
studio of FIG. 1. [...] time base corrector [...] and the like, as well as [...]
circuitry 310 such as power supplies, clock circuits [...]
circuits that assist in executing the various software routines within the play to air
server 110. [...] interface between the play to air server 110 and the mass storage device
[...]. Optionally, the memory 340 includes one or both of an index library 346 and a stream
library 348.

To provide a splicing operation such as described above with respect to FIGS. 2A
and 2B, the invention utilizes the transition clip generation function 344. The transition clip
generation function 344 generates a transition clip, such that it is possible to exit the first
stream 210 at a first prescribed Transport Packet boundary (determined by, e.g., the
transition stream generator), run the generated transition clip 230, and then enter the second
stream 220 at a second prescribed Transport Packet boundary. The actual exit
(210-TRANS) and entry (220-TRANS) points to the first 210 and second 220 streams will
typically not correspond to the actual frames that were requested. Rather, the transition clip
will be constructed using some number of frames immediately before the splice required
exit point 210-OUT of the first stream 210, and some number of frames immediately after
the splice required entry point 220-IN of the second stream 220.
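
The exit, transition and entry sequence described above can be illustrated as a simple concatenation. This is only a sketch under stated assumptions: streams are modeled as Python lists of opaque units in transmission order, whereas the disclosure operates on transport packet boundaries, and the name `assemble_splice` and its parameters are hypothetical.

```python
from typing import List, Sequence


def assemble_splice(first: Sequence[str], transition: Sequence[str],
                    second: Sequence[str], trans_out: int, trans_in: int) -> List[str]:
    """Concatenate first[..trans_out], the transition clip, and second[trans_in..].

    trans_out is the index of the last unit played from the first stream
    (the role of 210-TRANS); trans_in is the index of the first unit played
    from the second stream (the role of 220-TRANS).
    """
    if not 0 <= trans_out < len(first):
        raise ValueError("trans_out lies outside the first stream")
    if not 0 <= trans_in < len(second):
        raise ValueError("trans_in lies outside the second stream")
    return list(first[: trans_out + 1]) + list(transition) + list(second[trans_in:])


# Illustrative labels only; these are not frame numbers from the figures.
stream_a = ["A0", "A1", "A2", "A3", "A4", "A5"]
stream_b = ["B0", "B1", "B2", "B3", "B4", "B5"]
stream_t = ["T0", "T1", "T2"]
spliced = assemble_splice(stream_a, stream_t, stream_b, trans_out=2, trans_in=4)
```

In this model the units of the first stream after `trans_out` and the units of the second stream before `trans_in` are never played directly; their content reaches the output only through the recoded transition clip.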

The invention selects frames to be included in the transition stream in a manner that,
preferably, optimizes the quality of the inter-stream transitions. That is, even though a
splicing operation is performed in a frame accurate and seamless manner, it is possible for
the splicing operation to result in qualitative degradation of video information near the
splicing points. This is caused by "bit starving" or other coding anomalies resulting from,
e.g., mismatched video buffering verifier (VBV) levels. The invention adapts the VBV
levels to minimize such anomalies.

The index generation function 342 will now be described in detail. Two types of
information are used to build a transition clip: frame data and MPEG data. Frame data
comprises information such as the location, coding type and presentation order of particular
frames in the from- and to-streams. Frame data is used to determine which frames within
the from-stream and the to-stream are to be recoded to produce the transition clip. MPEG
data comprises information such as frame dimensions, bit rate, frame versus field formats,
video buffering verifier (VBV) delay, chrominance sampling formats and the like. MPEG
data is used to specify the MPEG encoding characteristics of the transport stream. The
transition clip is preferably encoded or recoded using the same MPEG parameters as the
input TS.

[...] streams. The determined parameters are stored in a meta file. [...] streams
processed by the index generation function [...] file associated with a transport stream
may be [...]

1) the current picture number (in display order);

2) picture coding type (I-, P- or B-frame);

3) the number of the transport packet containing the start of the frame;

4) the number of the transport packet containing the end of the frame;

5) the presentation time stamp (PTS) of the frame;

6) the decode time stamp (DTS) of the frame;

7) the number of the transport packet containing the start of the sequence header
[...] the frame; and

9) any indicia of the frame comprising an appropriate in frame or out frame, such as
provided by frame markings according to the SMPTE 312M splicing syntax.

[...] all fields for common MPEG-2 structures such as sequence headers, picture headers
[...] like. [...] the stream library 348 (or mass storage device 115) [...]
streams that have been processed according to the index generation function [...]
embodiment of the index generation function 342 will be described below with respect to
FIG. 10.
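
The per-frame items listed above suggest a simple record layout for a meta file entry. The following is a minimal sketch rather than the format used by the index generation function 342: the class name `FrameIndexRecord`, the field names and the validation rules are all assumptions introduced for illustration, and only the items that survive in the list above are modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FrameIndexRecord:
    """One hypothetical meta file entry describing a single frame."""
    picture_number: int   # item 1: picture number in display order
    coding_type: str      # item 2: "I", "P" or "B"
    start_packet: int     # item 3: transport packet holding the start of the frame
    end_packet: int       # item 4: transport packet holding the end of the frame
    pts: int              # item 5: presentation time stamp
    dts: int              # item 6: decode time stamp
    sequence_header_packet: Optional[int] = None  # item 7, when known
    splice_marking: Optional[str] = None          # item 9: "in", "out" or None

    def __post_init__(self) -> None:
        if self.coding_type not in ("I", "P", "B"):
            raise ValueError(f"unknown coding type {self.coding_type!r}")
        if self.end_packet < self.start_packet:
            raise ValueError("end_packet precedes start_packet")


def find_record(records: List[FrameIndexRecord], picture_number: int) -> FrameIndexRecord:
    """Return the record for a picture number, raising KeyError if it is absent."""
    for record in records:
        if record.picture_number == picture_number:
            return record
    raise KeyError(picture_number)


# Invented sample values, used only to exercise the structure.
sample_index = [
    FrameIndexRecord(1, "I", 0, 40, pts=6000, dts=3000, sequence_header_packet=0),
    FrameIndexRecord(2, "B", 61, 70, pts=9000, dts=9000),
]
```

Keeping the record immutable reflects the idea that an index is written once, at or near storage time, and only read when a transition clip is built.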

Since parsing a transport stream can be time consuming, one embodiment of the
invention utilizes pre-indexing. That is, transport streams stored within the mass storage
device 115 or stream library 348 are processed by the index generation function 342 at the
time of storage or as soon as possible thereafter. In this manner the time required to build
transition clips is greatly reduced since there is no need to parse transport streams at the
time of splicing to determine frame and MPEG parameters of the streams. In addition, the
play to air server 110 optionally utilizes the meta files stored within the mass storage device
115 or index library 346 to quickly retrieve characteristics of a transport stream that may be
needed for scheduling and other functions, such as frame rate.

FIG. 10 depicts a flow diagram of a method for indexing an information stream. 
Specifically, FIG. 10 depicts a flow diagram of a method 1000 suitable for use in the index 
generation function 342 of the play to air server 110 of FIG. 3. The method 1000 of 
FIG. 10 is suitable for use in implementing step 705 of the method 700 of FIG. 7. 

The method 1000 is entered at step 1005, when an information stream to be indexed 
is received. The method 1000 then proceeds to step 1010. 

At step 1010 the transport layer of the information stream to be indexed is parsed.
That is, the header portion of each transport packet within the information stream to be
parsed is examined to identify a transport packet number (tr), the presence or absence of a
sequence header within the transport packet, the presence or absence of a picture header
within the transport packet, the presence or absence of a SMPTE 312M splicing syntax
indication of a splicing in-frame or a splicing out-frame and other information. The method
1000 then proceeds to step 1015.
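
Transport layer parsing of the kind performed at step 1010 begins with the fixed four-byte header of each 188-byte MPEG-2 transport packet. The sketch below decodes only that fixed header and assigns packet numbers by position; detecting sequence headers, picture headers or SMPTE 312M splice indications requires looking further into the adaptation field and payload, which is not shown. The function names are hypothetical.

```python
from typing import Dict, Iterator, Tuple

TS_PACKET_SIZE = 188  # MPEG-2 transport packets are 188 bytes long
SYNC_BYTE = 0x47      # every transport packet starts with this byte


def parse_ts_header(packet: bytes) -> Dict[str, int]:
    """Decode the fixed four-byte header of one transport packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a transport packet must be exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("sync byte 0x47 not found")
    return {
        "transport_error": (packet[1] >> 7) & 0x01,
        "payload_unit_start": (packet[1] >> 6) & 0x01,  # set when a PES packet starts here
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],   # 13-bit packet identifier
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }


def iter_packets(stream: bytes) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (packet number, header fields) for each whole packet in the stream."""
    for number in range(len(stream) // TS_PACKET_SIZE):
        offset = number * TS_PACKET_SIZE
        yield number, parse_ts_header(stream[offset:offset + TS_PACKET_SIZE])


# A synthetic packet: PID 0x100, payload unit start set, continuity counter 7.
example_packet = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184)
example_header = parse_ts_header(example_packet)
```

The packet number produced by `iter_packets` plays the role of the transport packet number recorded in the index, so that a frame can later be located by seeking to a multiple of 188 bytes.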

At step 1015 the first or present frame is examined. That is, the information stream 
to be indexed is parsed down to the packetized elementary stream (PES) layer to examine 
the first video frame of the video elementary stream included within the information stream 
to be indexed. The method 1000 then proceeds to step 1020. 

At step 1020 various parameters associated with the frame examined in step 1015
are determined. Specifically, referring to box 1020-D, step 1020 determines the current
picture number (in display order), the picture coding type (I-, P- or B-frame), [...]
containing the end of the frame and the presentation time stamp (PTS) and decode time
stamp (DTS) of the frame. As previously noted with respect to step 1010, the transport
[...] frame such as provided by frame markings according to the SMPTE 312M splicing
syntax [...]. The method 1000 then proceeds to step 1025.

The quantity Bd is a buffer delay as marked in the stream. This is the amount of
time the first bit of a picture remains in the VBV buffer. The quantity CBd is the calculated
[...]. [...] stream is improperly marked, the two quantities may differ. The buffer delay
[...] by the invention to determine how to adjust the VBV levels between 210-TRANS and
220-TRANS. The VBV level adjustment is done in the transition clip.

At step 1025 the index information is stored in, e.g., the mass storage device 115 or
the index library 346. The method 1000 then proceeds to step 1030.

At step 1030 a query is made as to whether more frames are to be processed. If the
[...]. If the query is answered affirmatively, then the method 1000 [...]
where the next frame is queued, and to step 1015, where the next queued frame is
examined.

FIG. 11 depicts a tabular representation of a meta file suitable for use in the index
library 346 of FIG. 3. Specifically, the table 1100 of FIG. 11 comprises a plurality of
records (1-54), each record being associated with a respective starting transport packet
field 1110, packetized elementary stream identification field 1120, frame and frame type
identification field 1130, PTS field 1140, DTS field 1150, Bd field 1160, CBd field 1170 and
marked splice point field 1180.






WO 00/62552 PCT/US00/I0208 


In one embodiment of the invention, the index generation function 342 is not used
prior to receiving and/or splicing transport streams. In this embodiment, frame selection is
accomplished using a single-pass processing of at least a portion of each transport stream to
be spliced to determine several parameters related to the from-stream and to-stream.

For both the from-stream and the to-stream, the following parameters are
determined: transport packet offsets of the sequence_header and picture_header to begin
decoding; the number of frames to decode; and the number of decoded frames to discard
(e.g., anchor frames needed to decode frames to be included in the transition clip).

For the from-stream only, the following parameters are determined: the last transport 
packet to play from the from-stream (i.e., the new exit point or exit frame); and the PTS of 
first frame to display in the transition clip. 

For the to-stream only, the following parameters are determined: the starting and ending 
transport packets for the I-frame to copy to the transition clip; the starting and ending 
transport packets for remaining GOP to copy to the transition clip; the first transport packet 
to play from the to-stream (i.e., the new entry point or entry frame); and the number of 
frames to be copied. 

In addition, since the indexing library retrieves MPEG fields as it parses a transport 
stream, all required recoding parameters are also saved during frame selection. 

The transition clip generation function 344 will now be described in detail. The
process of constructing a transition clip comprises the steps of 1) determining which frames
to include in the transition clip; 2) decoding the frames to be included in the transition clip;
3) encoding or recoding the frames forming the transition clip; and 4) transport encoding
(i.e., packetizing) the transition clip.

Frame selection affects the size of the output transition clip, the amount of time 
required to generate the transition and places constraints on the encoder in terms of 
optimizing the quality of the recoded video. The frame selection method discussed herein 
resolves the issues of frame dependencies while reducing the frame count and still allowing 
enough transition time to recode the video without significant loss of quality. 

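The encoding or recoding step is typically the most time consuming step in the
transition clip generation function 344, so reducing the number of frames to recode provides
[...] while adjusting the VBV level (especially when decreasing it, since [...]).

FIG. 5 depicts a tabular representation of image frame display order and image frame
transmission order useful in understanding the invention. Specifically, FIG. 5 depicts a
first tabular representation 510 depicting the display order of, illustratively, 24 encoded
image frames forming a portion of a video sequence and a second tabular representation
520 depicting the transmission order of the 24 image frames forming the video sequence.
For purposes of this discussion, the video sequence depicted in FIG. 5 comprises a portion
of a from-stream video sequence (i.e., the first displayed sequence in a spliced sequence),
such as described above with respect to the first stream 210 of FIG. 2.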

Specifically, per the first tabular representation 510, the image frames are displayed
and encoded according to a group of pictures (GOP) structure as follows (from frame 1 to
frame 24):

I-B-B-P-B-B-P-B-B-I-B-B-P-B-B-P-B-B-I-B-B-P-B-B.

Additionally, per the second tabular representation 520, the image frames are
transmitted in the following frame order:

1-4-2-3-7-5-6-10-8-9-13-[...]

It is assumed, for purposes of the following discussion, that [...]
video sequence depicted in FIG. 5 at frame 15, which comprises a B-frame. [...]
necessary to decode frame 16 prior to decoding frames 14 and 15. [...]
frame in the from-clip prior to the transition clip will be frame 13. That is, [...]
from-clip will be exited immediately before frame 16.

FIG. 6 depicts a tabular representation of image frame display order and image
frame transmission order useful in understanding the invention. Specifically, FIG. 6 depicts 
a first tabular representation 610 depicting the display order of, illustratively, 26 encoded 
image frames forming a portion of a video sequence and a second tabular representation 
620 depicting the transmission order of the 26 image frames forming the video sequence. 
For purposes of this discussion, the video sequence depicted in FIG. 6 comprises a portion 
of a to-stream video sequence (i.e., the second displayed sequence in a spliced sequence), 
such as described above with respect to the second stream 220 of FIG. 2. 

Specifically, per the first tabular representation 610, the image frames are displayed
and encoded according to a group of pictures (GOP) structure as follows (from frame 1 to
frame 26):

I-B-B-P-B-B-P-B-B-I-B-B-P-B-B-P-B-B-I-B-B-P-B-B-I-B. 

Additionally, per the second tabular representation 620, the image frames are
transmitted in the following frame order:

1-4-2-3-7-5-6-10-8-9-13-11-12-16-14-15-19-17-18-22-20-21-25-23-24-28. 
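
The relationship between the display order and the transmission order listed above can be reproduced with a short reordering routine. This is a sketch of the usual rule that each anchor frame (I or P) is sent ahead of the B-frames that precede it in display order; it is not taken from the disclosure, and the function name `transmission_order` is hypothetical. Trailing B-frames whose following anchor lies outside the supplied fragment are simply appended, which a continuing stream would not do.

```python
from typing import List


def transmission_order(gop: str) -> List[int]:
    """Map a display-order string of frame types to 1-based frame numbers in
    transmission order, sending each anchor before the B-frames it follows."""
    order: List[int] = []
    pending_b: List[int] = []
    for number, frame_type in enumerate(gop, start=1):
        if frame_type == "B":
            pending_b.append(number)   # held until the next anchor has been sent
        elif frame_type in ("I", "P"):
            order.append(number)       # the anchor is transmitted first
            order.extend(pending_b)    # followed by the B-frames displayed before it
            pending_b.clear()
        else:
            raise ValueError(f"unknown frame type {frame_type!r}")
    order.extend(pending_b)            # B-frames with no later anchor in this fragment
    return order


# First 22 frames of the GOP structure given for FIGS. 5 and 6.
fragment_order = transmission_order("IBBPBBPBBIBBPBBPBBIBBP")
```

For the first 22 frames of the GOP structure the routine yields the same sequence as the leading entries of the transmission order listed above.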

It is assumed, for purposes of the following discussion, that it is desired to enter the
video sequence depicted in FIG. 6 at frame 15, which comprises a B-frame. That is, frame
15 comprises the in-frame of the entry stream depicted in FIG. 6. As will be discussed
below, frames 10 through 18 will be decoded (in display order). It should be noted that the
first frame to be displayed from the to-stream is frame 25 (an I-frame that is not included in
the transition clip).

FIG. 7 depicts a flow diagram of a method for generating a transition stream or
transition clip. Specifically, FIG. 7 depicts a flow diagram of a method 700 suitable for use
in the transition clip generation function 344 of the play to air server 110 of FIG. 3.

The method 700 is entered at step 705, where a "from-stream" and "to-stream" are 
annotated. That is, the information stream providing the information prior to a splice point 
(the from-stream) and the information stream providing information subsequent to the 
splice point (the to-stream) are annotated to identify, on a frame-by-frame basis various 
frame parameters as described above with respect to the index generation function 342. A 
method for annotating [...] the exit frame is [...]
of the to-stream to be displayed) [...]
produce a transition clip or transition stream. A transport [...]
audio information associated with the from-stream and to-stream. [...]
used as a transition between the from-stream and the to-stream by, e.g., the play to air
server 110 of FIGS. 1 and 3.

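A. Frame Selection.

The first step in the process of constructing a transition clip or transition stream
comprises the step of selecting the frames to be included in the transition stream (i.e., a
frame selection process).

FIG. 8 depicts a flow diagram of a method for determining which information frames
within a from-stream should be included within the transition stream. The method 800 of
FIG. 8 is suitable for use in implementing step 710 of the method 700 of FIG. 7.

The method 800 is entered at step 805, where the exit frame of the from-stream is
identified. The exit frame of the from-stream is the last frame within the from-stream to be
displayed prior to a splice point. For example, referring now to the from-stream depicted in
FIG. 5, the exit frame (frame 15) comprises a B-frame denoted as frame 513. The method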
800 then proceeds to step 810.

At step 810 the method 800 decodes, in display order, the exit frame and the
immediately preceding non-anchor frames. That is, referring again to FIG. 5, the exit frame
(frame 15) and the immediately preceding non-anchor frames (frames 11, 12, 13 and 14) are
decoded. Since frames 11, 12 and 13 are predicted using frame 10, it is necessary to also
decode frame 10. However, the decoded frame 10 may be discarded after frames 11-13
have been decoded. That is, all frames from the I-frame preceding the exit frame in display
order up to and including the exit frame are decoded. It is necessary to start from the
I-frame because the I-frame has no frame dependencies (i.e., it can be decoded without first
decoding any other frames). The method 800 then proceeds to step 815.
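
The decode range described for step 810 (from the I-frame preceding the exit frame, in display order, up to and including the exit frame) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, and because the GOP structure of FIG. 5 is not reproduced in this text, the example GOP below is an assumed structure in which frame 10 is an I-frame and frame 15 is a B-frame.

```python
def frames_to_decode(display_types, exit_frame):
    """Return the 1-based display-order frame numbers to decode.

    display_types lists frame types ("I", "P", "B") in display order;
    exit_frame is the 1-based number of the exit frame.
    """
    # Walk backwards from the exit frame to the nearest I-frame, which can
    # be decoded without reference to any other frame.
    for start in range(exit_frame, 0, -1):
        if display_types[start - 1] == "I":
            return list(range(start, exit_frame + 1))
    raise ValueError("no I-frame at or before the exit frame")


# Assumed (hypothetical) GOP structure, used only for illustration.
example_gop = "I-B-B-P-B-B-P-B-B-I-B-B-P-B-B-P-B-B".split("-")
print(frames_to_decode(example_gop, 15))
```

Under the assumed structure the result is frames 10 through 15, which matches the frames named in the discussion of step 810.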

At step 815 a query is made as to whether the exit frame is a B-frame. If the query
at step 815 is answered negatively, then the method proceeds to step 820. If the query at 
step 815 is answered affirmatively, then the method 800 proceeds to step 825. 

At step 820, since the exit frame is either an I-frame or a P-frame, the last from-stream
frame to be displayed (i.e., the transition frame) prior to the transition stream frames is the
frame immediately preceding, in transmission order, the exit frame. That is, if frame 15 of
the from-stream depicted in FIG. 5 was a P-frame or I-frame rather than B-frame, then the
last from-stream frame to be displayed would be frame 14. If the exit frame is an I- or
P-frame, frame dependencies and reordering make it possible to leave the transport
immediately before the next anchor frame (i.e., after all B-frames that are dependent on the
exit frame). While this reduces the number of frames to recode, it also reduces the
opportunity to adjust VBV levels for the transition. The method 800 then proceeds to step
830.

At step 825, if the exit frame is a B-frame (such as the exit frame in the from-stream
depicted in FIG. 5), then the last from-stream frame to be displayed is the frame
immediately preceding, in transmission order, the preceding anchor frame. Referring now
to FIG. 5, the preceding anchor frame with respect to the exit frame is a P-frame (frame 13).
It should be noted that the last frame to be transmitted of the 24 frame sequence depicted in
FIG. 5 is the B-frame 12, while the last frame to be displayed is the P-frame 13. The
method 800 then proceeds to step 830.

At step 830 the decoded frames following, in display order, the last from-stream
frame (e.g., the B-frame denoted as frame 12 in FIG. 5) are stored in the transition clip. It
should be noted that the transition stream or clip will also include frames from the
to-stream. All of the frames that are stored within the transition clip will then be re-encoded
to form an encoded transition clip or transition stream.

FIG. 9 depicts a flow diagram of a method for determining which information
frames within a to-stream should be included within the transition stream. The
method 900 of FIG. 9 is suitable for use in implementing step 715 of the transition stream
generation method 700 of FIG. 7.

The method 900 is entered at step 905, where the entry frame of the to-stream is
identified. The entry frame of the to-stream is the first frame within the to-stream to be
displayed after a splice point. For example, referring now to the to-stream depicted in
FIG. 6, the entry frame (frame 15) comprises a B-frame. The method 900 then proceeds to
step 910.

At step 910 the entry frame and all frames appearing before the next I-frame, in
display order, are decoded. That is, referring to FIG. 6, the entry frame (frame 15) and the
frames (i.e., frames 16, 17 and 18) appearing before the next I-frame (frame 19) are
decoded. Since frames 17 and 18 in the to-stream video sequence depicted in FIG. 6 are
predicted using information from the next I-frame (frame 19), it is necessary to also decode
the next I-frame. However, the decoded frame 19 may be discarded after frames 17 and 18
have been decoded. The method 900 then proceeds to step 915.

At step 915 the next I-frame (e.g., frame 19 of video sequence 610) is copied to the
transition clip. That is, the video information within the transport packets forming the
stream (i.e., the video elementary stream information) is extracted from the transport
packets and copied to the transition clip. It is noted that the output of the encoder is a video
elementary stream (VES) such that the output from the encoder may be copied directly to
the transition clip. The transition clip will be subsequently packetized. The method 900
then proceeds to step 920.

At step 920 the frames (e.g., frames 20 through 22) between the next I-frame (e.g.,
frame 19) and the following I-frame (frame 25) are also copied, in transmission order, to the
transition clip. It must be noted that the frames copied to the transition clip in steps 915 and
920 (e.g., frames 19-21) are copied to the transition clip as encoded frames. Thus, the
method 900 adds to the transition clip decoded frames comprising the entry frame and all
frames appearing before the next I-frame, and encoded frames comprising the next I-frame
and all frames between the next I-frame and the following I-frame.
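
The to-stream selection of steps 905 through 920 can be sketched as follows, using the GOP structure given for FIG. 6. This is a minimal sketch, not the patented implementation: the function names are hypothetical, the entry frame is assumed not to be an I-frame, and "between the next I-frame and the following I-frame" is interpreted in transmission order, excluding frames already selected for decoding.

```python
def transmission_order(display_types):
    """Return 1-based frame numbers in transmission order."""
    order, pending_b = [], []
    for number, frame_type in enumerate(display_types, start=1):
        if frame_type == "B":
            pending_b.append(number)
        else:
            order.append(number)      # anchor frame first
            order.extend(pending_b)   # then the B-frames that precede it
            pending_b = []
    order.extend(pending_b)
    return order


def select_to_stream_frames(display_types, entry_frame):
    """Return (frames to decode and re-encode, frames to copy as encoded)."""
    i_frames = [n for n, t in enumerate(display_types, start=1)
                if t == "I" and n > entry_frame]
    if len(i_frames) < 2:
        raise ValueError("need two I-frames after the entry frame")
    next_i, following_i = i_frames[0], i_frames[1]
    # Step 910: the entry frame and all frames before the next I-frame.
    recode = list(range(entry_frame, next_i))
    # Steps 915 and 920: the next I-frame and the frames transmitted
    # between it and the following I-frame, kept in encoded form.
    tx = transmission_order(display_types)
    segment = tx[tx.index(next_i):tx.index(following_i)]
    copy = [n for n in segment if n not in recode]
    return recode, copy


fig6_gop = "I-B-B-P-B-B-P-B-B-I-B-B-P-B-B-P-B-B-I-B-B-P-B-B-I-B".split("-")
recode, copy = select_to_stream_frames(fig6_gop, 15)
print(recode, copy)
```

For entry at frame 15 the sketch selects frames 15 through 18 for re-encoding and copies frame 19 followed by frames 22, 20 and 21 in transmission order, consistent with "frames 20 through 22" in the text.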

The from-stream and to-stream frame selection methods described above with
respect to FIGS. 8 and 9 allow for frame dependencies between the transition stream frames
and those in one or both of the from-stream and to-stream. The following constraints
should be observed. The transition clip is encoded as a closed GOP structure. That is, the
transition clip is a self-contained video clip. The transport stream being exited will not
reference any frames in the transition clip. If the transport stream being entered is coded
using an open GOP structure, then it may contain frames that reference frames in the
transition clip.

An important aspect of the invention is the processing of the transition clip to
appropriately address frame dependencies of frames that are included within the transition
clip. A frame dependency comprises, e.g., a predicted frame within the transition clip (i.e.,
a P-frame or B-frame) that must be decoded using an anchor frame from outside of the
transition clip. While it is desirable to create a transition clip in which there are no external
frame dependencies (i.e., a "self contained" clip), the invention is capable of producing an
MPEG compliant transition clip including such frame dependencies.

B. Decoding. 

The second step in the process of constructing a transition clip or transition stream 
comprises the step of decoding the frames selected in the frame selection process. The 
decoding of the selected frames may be effected using standard hardware or software 
decoding techniques. 

It should be noted that, regardless of which frames are to be decoded, decoding must
begin at an I-frame. As an artifact of the use of prediction in MPEG encoding, every
non-I-frame is ultimately dependent on the previous I-frame. The above-described frame
selection methods break these dependencies in order to enable frame accurate, seamless
splicing between transport streams.

C. Encoding.

The third step in the process of constructing a transition clip or transition stream
comprises the step of encoding the decoded frames resulting from the frame selection and
decoding processes. The encoding of the selected frames may be effected using standard
hardware or software encoding techniques.

In addition to breaking frame dependencies (as noted above), one of the primary
objectives when generating a transition clip is to adjust the VBV levels between the
from-stream and to-stream such that a far-end decoder processing the resulting spliced
[...] downstream from the splice. In typical decoders this will result in "freeze frames" while
the decoder waits for data to become available. A much more serious problem occurs when
[...] VBV level of the entry [...] corrupted data and typically cause visual artifacts in the
decoded pictures and can even cause a decoder to reset.
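
The effect of a VBV level mismatch at a splice can be illustrated with a simplified buffer model. This is a sketch under loudly stated assumptions, not the VBV model of the MPEG-2 standard: it assumes a constant-rate channel, removal of one whole frame per frame interval, and hypothetical function and parameter names. An underflow corresponds to the decoder waiting for data; an overflow corresponds to data arriving when the buffer is already full.

```python
def simulate_vbv(frame_bits, bitrate, fps, buffer_size, initial_level):
    """Return a list of (frame_index, event) for a simplified VBV model.

    frame_bits    -- coded size of each frame, in bits, in decode order
    bitrate       -- channel rate in bits per second (assumed constant)
    fps           -- frames decoded per second (integer, for simplicity)
    buffer_size   -- decoder buffer capacity in bits
    initial_level -- buffer occupancy when the first frame is decoded
    """
    level = initial_level
    events = []
    bits_per_interval = bitrate // fps
    for index, bits in enumerate(frame_bits):
        if bits > level:
            # The frame has not fully arrived: the decoder must wait.
            events.append((index, "underflow"))
            level = 0
        else:
            level -= bits
        level += bits_per_interval
        if level > buffer_size:
            # More data arrives than the buffer can hold.
            events.append((index, "overflow"))
            level = buffer_size
    return events


print(simulate_vbv([600], 3000, 30, 1000, 500))
```

In this model, entering a stream whose frames assume a higher starting level than the buffer actually holds produces underflow events, while entering with too high a level produces overflow events.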

After the selected frames have been decoded to baseband, they are recoded into a
VES. The inventors used a Sarnoff Corporation DTV [...] Software Encoder to ensure
high overall performance, picture quality and modularity. The rate control algorithm of the
encoder was modified to allow specification of initial and ending VBV [...]
[...] module of the encoder was updated to support the output file format of the decoder.
The MPEG encoding parameters that were parsed from the transport stream during frame
selection are passed to the encoder to ensure that the recoded video is compatible with the
clips being spliced.

With respect to rate control (which ultimately determines overall picture quality of
the recoded portion of the transition clip), when adjusting the VBV level upwards, the
recoded frames are coded using fewer bits than the original stream. While increasing the
level may result in some loss of quality in the resulting output, due to masking in the
human visual system, a small degradation in video quality at a scene change is often
imperceptible to a viewer. The inventors have determined that such visual degradation
imparted to a stream including a frame accurate, seamless splice does not result in a
perceptible level of video degradation.

In one embodiment of the invention, the from-stream and to-stream each comprise
transport streams having a respective video buffering verifier (VBV). The invention
determines if a difference exists between the from-stream VBV and the to-stream VBV and
responsively adapts the re-encoding process to such a difference, as necessary. For 
example, the invention may adapt the re-encoding process by increasing a rate control bit 
allocation in response to a determination that the from-stream VBV exceeds the to-stream 
VBV by a first threshold level, and by decreasing the rate control bit allocation in response 
to a determination that the to-stream VBV exceeds the from-stream VBV by a second 
threshold level. 
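
The threshold-based adaptation described in this embodiment can be sketched as follows. This is a minimal sketch: the function name, the fixed adjustment step and the example values are hypothetical, and an actual rate control algorithm would distribute the change across the re-encoded frames rather than apply a single increment.

```python
def adapt_bit_allocation(base_bits, from_vbv, to_vbv,
                         first_threshold, second_threshold, step):
    """Return an adjusted rate control bit allocation.

    The allocation is increased when the from-stream VBV exceeds the
    to-stream VBV by at least first_threshold, decreased when the
    to-stream VBV exceeds the from-stream VBV by at least
    second_threshold, and left unchanged otherwise.
    """
    if from_vbv - to_vbv >= first_threshold:
        return base_bits + step
    if to_vbv - from_vbv >= second_threshold:
        return max(0, base_bits - step)
    return base_bits


# Hypothetical values, in bits.
print(adapt_bit_allocation(400_000, 900_000, 600_000, 100_000, 100_000, 50_000))
```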



D. Packetizing. 

The fourth step in the process of constructing a transition clip or transition stream
comprises the step of packetizing the encoded frames resulting from the encoding
process.

After recoding the selected frames, the I-frame and remaining GOP that were copied 
from the to-stream are appended to the recoded VES. Pending restamping of 
temporal_reference fields, the resulting transition clip comprises a syntactically complete 
MPEG-2 stream (except that it does not have a sequence_end_code) and contains all frames
in the transition. The final step is to packetize the VES into a transport stream. 

The first step in packetizing the transition stream is to parse the transition stream to
locate the offsets of the start of each frame (either a sequence_header or a picture_header)
and the types of frames within the transition stream. Once this data is available, the
dependencies between frames are calculated and the frame display order is determined. It
should be noted that the temporal_reference fields are unsuitable for this purpose since they
are presently invalid due to GOP restructuring. Once the display order has been
determined, the temporal_reference fields are re-stamped and the presentation (PTS) and
[...] transition stream.

[...] allowed temporal discontinuities within a [...] compliant with the MPEG-2 standard,
[...] desirable to remove such [...] discontinuities within a transport stream by the use [...]

[...] PES headers are generated and the frames are output into a PES stream. [...] transport
packets are generated to [...] the PES packets. [...] size increase. The packets in the resulting
TS are stamped with the PID of the video stream being spliced. The final output of the
packetizing process is a TS containing a single VES. This stream does not contain any
program specific information (PSI).
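
The derivation of display order from frame types, and the re-stamping of temporal_reference values from that order, can be sketched as follows. This is a simplified illustration with hypothetical function names: it assumes a single closed GOP given in coded (transmission) order, whereas a real stream restarts temporal_reference numbering at each GOP and would also require PTS and DTS values to be computed.

```python
def display_order(coded_types):
    """Return coded-order indices arranged in display order.

    coded_types lists frame types ("I", "P", "B") in coded order.
    """
    order = []
    held_anchor = None  # most recent anchor, displayed after its B-frames
    for index, frame_type in enumerate(coded_types):
        if frame_type == "B":
            order.append(index)
        else:
            if held_anchor is not None:
                order.append(held_anchor)
            held_anchor = index
    if held_anchor is not None:
        order.append(held_anchor)
    return order


def restamp_temporal_reference(coded_types):
    """Return a temporal_reference value for each frame in coded order."""
    refs = [0] * len(coded_types)
    for position, coded_index in enumerate(display_order(coded_types)):
        refs[coded_index] = position
    return refs


coded = list("IPBBPBB")
print(display_order(coded), restamp_temporal_reference(coded))
```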

E. Remultiplexing.

The fifth step in the process of constructing a transition clip or transition stream [...]
program specific information (PSI) from the original program stream. [...] transport
packets) a single instance of the program association table (PAT) [...] program map table
(PMT) [...] one PMT. In the case of splicing multiple [...] multiple PMTs. Optionally, to
fully implement the ATSC broadcast format, it is necessary to extract other tables as well
(as known to those skilled in the art).

After extracting the PAT and the PMT(s), the number of packets in the transition
clip is calculated based on the multiplex bit rate, the number of frames in the transition clip
and the frame rate. For example, the ATSC specification requires a PAT at least every
100ms and a PMT at least every 400ms. The number of packets between PAT and PMT
tables is determined from the multiplex bit rate.

After calculating the number of packets in the transition clip, a blank transition clip 
composed of null transport packets is created and the PAT and PMT tables are inserted at 
the calculated spacings (e.g., PAT every 100ms and PMT every 400ms).

After appropriately inserting the PAT and PMT(s) in the blank transition stream, the 
video transport stream is inserted into the blank transition stream by spacing packets within 
the remaining available packets, thereby forming an output transport stream. 
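
The packet count calculation and the construction of the blank transition clip can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the example bit rate and the even-spacing rule are hypothetical, an integer frame rate is used for simplicity, and VBV delays are ignored. The 188-byte transport packet size is that of MPEG-2 transport streams.

```python
TS_PACKET_BITS = 188 * 8  # size of one MPEG-2 transport packet in bits


def build_transition_layout(mux_bps, frame_count, fps, video_packets):
    """Return a list of slot labels: "PAT", "PMT", "VID" or "NULL"."""
    total = mux_bps * frame_count // (fps * TS_PACKET_BITS)
    pat_every = max(1, mux_bps * 100 // (1000 * TS_PACKET_BITS))  # 100 ms
    pmt_every = max(1, mux_bps * 400 // (1000 * TS_PACKET_BITS))  # 400 ms
    slots = ["NULL"] * total  # blank clip composed of null packets
    for i in range(0, total, pat_every):
        slots[i] = "PAT"
    for i in range(0, total, pmt_every):
        # Place the PMT in the first free slot at or after its position.
        j = i
        while j < total and slots[j] != "NULL":
            j += 1
        if j < total:
            slots[j] = "PMT"
    free = [i for i, label in enumerate(slots) if label == "NULL"]
    if video_packets > len(free):
        raise ValueError("video clip too large for the transition clip")
    # Space the video packets evenly within the remaining packets.
    for k in range(video_packets):
        slots[free[k * len(free) // video_packets]] = "VID"
    return slots


# Hypothetical example: 15.04 Mb/s multiplex, 15 frames at 30 frames/s.
layout = build_transition_layout(15_040_000, 15, 30, 3000)
print(len(layout), layout.count("PAT"), layout.count("PMT"))
```

The error raised when the video does not fit corresponds to the case, noted in the text, in which the video clip is too large for the calculated number of transport packets.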

It should be noted that when inserting the PAT, PMT and video packets into the 
empty transition clip, each packet should be restamped with a new continuity_counter. The
starting value of the continuity_counter is determined separately for each PID from the 
exit-stream or from-stream. If the video clip is too large, then there won't be enough 
transport packets in the transition clip, since the size of the transition clip is calculated with 
respect to the expected clip duration. This calculation takes into account the frame count, 
frame rate, VBV delays, multiplex bit rate etc. It is important that VBV adjustment is 
performed properly by the encoder. 
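
The per-PID restamping of the continuity_counter can be sketched as follows. This is a minimal illustration: the function name and the example PID for the video stream are hypothetical, and the packets are represented only by their PIDs. The continuity_counter is a 4-bit field, so it wraps from 15 to 0.

```python
def restamp_continuity(packet_pids, last_counters):
    """Return (pid, continuity_counter) pairs for the transition clip.

    packet_pids   -- PID of each packet in the transition clip, in order
    last_counters -- last continuity_counter seen for each PID in the
                     from-stream before the splice point
    """
    counters = dict(last_counters)
    stamped = []
    for pid in packet_pids:
        # Continue from the from-stream value, separately for each PID.
        counters[pid] = (counters.get(pid, -1) + 1) % 16
        stamped.append((pid, counters[pid]))
    return stamped


# PID 0 carries the PAT; 256 is a hypothetical video PID.
print(restamp_continuity([0, 256, 256, 0], {0: 15, 256: 3}))
```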

The completed transition clip is then inserted between the spliced transport streams 
at the calculated transport packet offsets, thereby executing a seamless splice. 

The above-described invention advantageously provides for seamless, frame 
accurate splicing or concatenation of transport streams using transition streams or clips,
thereby avoiding the construction of an entirely new transport stream. The from- and
to-streams are not modified during the process, since they are only used to provide
information sufficient to produce the transition stream. The transition stream, after being 
used to effect a change between streams, may be discarded by the system or saved for future 
use. 

The invention has primarily been described within the context of generating a 
transition stream comprising video information suitable for use in providing a seamless 
splice of, illustratively, an MPEG-2 transport stream including a video stream or
sub-stream. It will be appreciated by those skilled in the art that other forms of information
are often associated with such video streams. For example, many video streams are
associated with corresponding audio streams. In addition, other forms of information such
as data essence and meta-data may be incorporated into an information stream including
video information. Data essence is data that has a context independent of the video and/or
audio data within a stream. Examples of data essence comprise stock quotations, weather
advisories and other news, messages or control information not related to the video and/or
audio data and the like.

Meta-data is data relating to other data such as data describing [...] stream. Examples
of meta-data include video or internet data broadcast [...] actors in a movie, title of a
presentation and the like.

In the case of audio information, data essence and/or meta-data associated with
[...] within a video information stream, it is desirable to ensure that all [...] in the case of a
splicing application where one or more video streams [...] transition clip enabling the
splice be included within that transition clip.

FIG. 4A comprises a graphical representation of a splicing operation [...] as stream A; a
to-stream [...] stream T. It should be noted that each of streams A (410), B (420), and [...]
data essence and/or audio data associated with the exit or entry frame are likely to be lost or
incompletely provided to a transition stream.

Stream A (410) is bounded by a start video frame 410-ST and an ending video frame
410-END. Stream A comprises a from-stream that will be exited at an exit video frame
410-OUT. Thus, as discussed above with respect to the transition stream generation
methods, the plurality of information frames beginning with a transitional video frame
410-TRAN and ending with an exit video frame 410-OUT will be decoded for use in forming
the transition stream. However, the exit video frame 410-OUT is associated with meta-data
410-MD, data essence 410-DE and audio data 410-AD that is located within stream A after
the exit video frame 410-OUT. It should be noted that such data may also be located before
the exit video frame 410-OUT. Thus, to incorporate this non-video data into the transition
stream it is necessary to extract or decode the non-video data. Referring to stream A (410),
the non-video data associated with the exit frame 410-OUT is bounded by the transition
frame 410-TRAN and an extent frame 410-EXT defining the maximal boundary (or extent)
likely to be associated with the non-video data.

Stream B (420) is bounded by a start video frame 420-ST and an ending video frame
420-END. Stream B comprises a to-stream that will be entered at entry video frame
420-IN. Thus, as discussed above with respect to the transition stream generation methods,
the plurality of information frames beginning with the entry frame 420-IN and ending with a
transitional video frame 420-TRAN will be decoded for use in forming the transition stream
430. However, the entry video frame 420-IN is associated with meta data 420-MD, data
essence 420-DE and audio data 420-AD that is located within stream B before the entry
video frame 420-IN. It should be noted that such data may also be located after the entry
video frame 420-IN. Thus, to incorporate this non-video data into the transition stream 430
it is necessary to extract or decode the non-video data. Referring to stream B (420), the
non-video data associated with the entry frame 420-IN is bounded by an extent frame
420-EXT and the transition frame 420-TRAN. The extent frame 420-EXT defines the maximal
boundary (or extent) likely to be associated with the non-video data preceding in bit stream
order the entry frame 420-IN.

Thus, to capture all of the video frames appropriate to the transition stream and all
of the non-video data associated with those video frames, the deconstructed portion of
stream A is bounded by 410-TRAN and 410-EXT. Similarly, the deconstructed portion of
stream B is bounded by 420-EXT and 420-TRAN. After decoding and/or [...] the video data,
meta data, data essence and audio data from [...], the transition stream 430 is formed in a
manner including such [...]. The transition stream 430 is bounded by a start frame 430-ST
and an end [...] defining a frame accurate splice between the [...] at the appropriate exit frame
410-OUT and entry frame 420-IN. Additionally, the meta data [...] and/or audio data [...]
included [...] video data packets in a manner preserving the association between the [...]
data packets.

FIG. 4B comprises a graphical representation of a splicing operation useful in
understanding the invention. Specifically, FIG. 4B comprises a first multi-program [...]
between such multi-program transport streams in a manner preserving [...] between
non-video data and the video data associated with [...]

[...] multiplex A 440 comprises three transport [...] comprises [...] (453). For purposes of
this discussion [...] concatenated to transport MUX A at the sub stream level. That is,
program 1 441 and program A 451 will be concatenated to form a first transport sub stream
within a transition stream comprising a plurality of sub streams. Specifically, program 1 will
be exited at an out frame 441-OUT while program A will be entered at an IN frame [...].
Similarly, program 2 will be exited at an out frame [...] will be entered at an IN frame 452-IN;
program 3 will be exited at an OUT frame [...] program C will [...] splice points as indicated
in FIG. 4B and described above.

In addition to video frames, each of the transport sub streams includes non-video
data such as meta data, data essence and audio data. As indicated in FIG. 4B, each of the 
splice points and the video frames included within the transition stream is associated with 
an extent of such non-video data. Thus, each of the transport MUX sub streams will be 
decoded or otherwise processed to accommodate the extraction of all necessary video and 
non-video data to effect individual transition sub streams. The individual transition sub 
streams are then incorporated into a multi-program transition stream for subsequently 
concatenating the first multi-program stream A (440) and the second multi-program stream 
B (450). 

FIG. 4C depicts a graphical representation of a splicing operation useful in 
understanding the invention. Specifically, FIG. 4C depicts a reservation of non-video 
packet place holders within a transition stream under construction 460. That is, while 
forming a transition stream, it is likely that the step of encoding the decoded video frames 
from the frames being spliced is performed prior to the step of inserting non-video data into
the partially formed transition stream. To ensure that the non-video data within the 
transition stream may be located proximate to the video data with which it is associated, 
placeholders are established during the video encoding process to allow for subsequent 
insertion of the non-video data within the transition stream. Specifically, as indicated in 
FIG. 4C, a plurality of audio, data essence and/or meta data place holders are inserted 
within a transition stream under construction. Upon completion of the transition stream, 
those place holders not utilized to store such non-video data are deleted and the resulting 
completed transition stream 460' is utilized as the transition stream.

Within the context of a multi-program transport stream such as described above with
respect to FIG. 4B, each of the transport sub streams being formed during the transition 
stream generation process utilizes a respective set of non-video data place holders. Each 
stream, upon completion, deletes or otherwise "de-utilizes" or releases the unused place 
holders (e.g., inserting NULL data) to form a completed transition stream. 
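
The reservation and later release of non-video place holders can be sketched as follows. This is a minimal sketch with hypothetical names and data representations: the transition stream under construction is modeled as a list in which place holders are tuples naming the kind of non-video data they reserve, and unused place holders are released by substituting NULL entries.

```python
def fill_place_holders(clip, non_video):
    """Return the completed clip with place holders filled or released.

    clip      -- list of entries; a place holder is ("placeholder", kind)
    non_video -- mapping from kind (e.g. "audio") to packets to insert
    """
    queues = {kind: list(packets) for kind, packets in non_video.items()}
    completed = []
    for entry in clip:
        if isinstance(entry, tuple) and entry[0] == "placeholder":
            queue = queues.get(entry[1], [])
            # Use the place holder if data of that kind remains;
            # otherwise release it as NULL data.
            completed.append(queue.pop(0) if queue else "NULL")
        else:
            completed.append(entry)
    return completed


under_construction = ["V0", ("placeholder", "audio"), "V1",
                      ("placeholder", "meta"), ("placeholder", "audio")]
print(fill_place_holders(under_construction, {"audio": ["A0"]}))
```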

The resulting transition stream or transition clip 430 comprises video information 
and non-video information from each of the streams A and B.

FIG. 12 depicts a flow diagram of a method for generating a transition stream or
transition clip incorporating pixel domain effects. Specifically, FIG. 12 depicts a flow
diagram of a method 1200 suitable for use in the transition clip generation function 344 of
the play to air server 110 of FIG. 3.

The method 1200 is entered at step 1210, where a "from-stream" and "to-stream"
are annotated. A method for annotating an information stream has previously been
described with respect to FIG. 10. As previously noted, such annotation is not
necessary to practice the invention. However, the process of annotating the streams is
useful in efficiently processing the streams in subsequent processing steps or by other
processing components. The method 1200 then proceeds to step 1220.

At step 1220 a portion of the from-stream prior to the exit frame is decoded, such as
described above with respect to step 710 of the method 700 of FIG. 7. The method 1200
then proceeds to step 1230.

At step 1230 a portion of the to-stream beginning with the entry frame is decoded
such as described above with respect to step 715 of the method 700 of FIG. 7. The method
1200 then proceeds to step 1240.

At step 1240 the decoded portions of the from-stream and to-stream are subjected to
one or more pixel domain processing steps to provide, for example, a special effect or other
processing effect. The special effect provided at step 1240 may comprise one or more of
the special effects noted in box 1240; namely, morphing, fade, wipe, dissolve, push, reveal,
black-frame, freeze-frame or other well-known pixel domain processing effects. A
morphing effect comprises a gradual (e.g., frame by frame) change from one shape into
another. A wipe effect comprises a changing from one image to another image via an
image regional change such as changing the location of a vertical bar delineating the first
and second images from, for example, left to right or top to bottom. A fade or dissolve
effect comprises a gradual fading or dissolving of a first image to reveal a second image
underlying the first image. The underlying image may also emerge in a manner
opposite to the fading first image. A black (or blue) frame effect comprises the insertion of
a monochrome frame(s) between two images. A "push" effect is an effect wherein an old
image appears to slide off the screen as if it were being pushed by a new image sliding onto
the screen. The old image and new image may be slid in any direction to produce this
effect. A "reveal" effect is where an old image is removed to reveal an underlying new
image. A reveal effect may comprise a "peel back" effect in which a "turned up corner," or
a graphical representation of a turned up corner, reveals a portion of a new image
underlying the old image. Upon selection of the new image, the old image is peeled back or
otherwise removed from view beginning with the turned up corner portion to reveal the
underlying new image.
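
As one illustration of the pixel domain effects described for step 1240, a fade or dissolve between a decoded from-stream frame and a decoded to-stream frame can be sketched as a weighted blend. This is a minimal sketch: the function name is hypothetical, frames are modeled as rows of integer sample values, and a linear weighting is assumed.

```python
def dissolve(old_frame, new_frame, steps):
    """Return `steps` intermediate frames blending old_frame into new_frame.

    Frames are lists of rows of integer sample values of equal dimensions.
    """
    frames = []
    for k in range(1, steps + 1):
        # Weight of the new image grows linearly from 1/(steps+1) upward.
        frames.append([
            [(old * (steps + 1 - k) + new * k) // (steps + 1)
             for old, new in zip(old_row, new_row)]
            for old_row, new_row in zip(old_frame, new_frame)
        ])
    return frames


old_image = [[0, 0]]
new_image = [[100, 100]]
print(dissolve(old_image, new_image, 4))
```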

A non-pixel domain effect for the meta-data domain may comprise a closed caption 
change at a sentence boundary. A non-pixel domain effect for the audio domain may 
comprise an audio fade from stream A audio, through silence, and back to audio 
information associated with stream B to form the spliced information stream. 

The pixel domain processing step(s) may be used to provide artistic or interesting 
means of transitioning between video clips. For example, a reveal effect may be
implemented in a 6 frame transition clip by transitioning from frame one to frame six via
the four intervening frames including portions of frames one and six. While it is desirable
to ensure that the pixel domain processing impart some form of transitional information to a
viewer, such imparting of transitional information is not necessary. The method 1200 then 
proceeds to step 1250. 

In one embodiment of the invention, the pixel domain process is performed with 
respect to a plurality of transport streams or other streams. Specifically, it is noted that the 
invention has been described above primarily within the context of two transport streams 
including at least image information being concatenated to produce a spliced transport 
stream including at least image information. During the generation of the transition stream 
or transition clip, the image information within the respective transport streams is decoded 
such that pixel domain information is available for processing by a pixel domain process. 
In one embodiment of the invention, additional pixel domain (or non-pixel domain) 
information is used during the pixel domain or non-pixel domain processing step. In a 
chroma-key processing example, a transport stream including a chroma-keying signal, 
herein denoted as a K-stream, includes video information having one or more chroma-keyed 
image regions. A first keyed image region within the K-stream may be indicated by a first 
color, while a second keyed image region of the K-stream may be indicated by a second 
color. The pixel domain information within the transition clip associated with the first 
keyed region is replaced by information from a first information source or information 
stream, while the pixel domain information within the transition clip associated with the
second keyed region is replaced by information from a second information source or
information stream. Thus, in the case of stream A comprising a K-stream having [...] are
used (denoted as region stream one and region stream two) [...] information to replace the
first and second keyed regions. It will be appreciated by those skilled in the art that any
number of regions may be [...] and that non-pixel information may also be divided into
regions.

At step 1250, the decoded and processed video frames are re-encoded to form a
transition stream. Step 1250 may be implemented in substantially the same manner as
described above with respect to step 720 of the method 700 of FIG. 7.

Thus, the method 1200 of FIG. 12 provides, in addition to the generation of a
transition stream or transition clip, the adaptation of [...] information within that transition
stream or transition clip to an artistic or interesting visual purpose. In this manner,
well-known pixel domain processing techniques may be used to impart a more realistic
transitional impression to a viewer as the from-stream is exited [...] to-stream is entered.
[...] processing in non-video domains may also be performed on the non-video data
discussed above with respect to FIGS. 4A-4C.

Thus, the utility of the present invention extends beyond the bare notion of pixel or
image domain processing of only two image streams. Rather, the subject invention finds
[...] applicability where a plurality of information streams may be used to process pixel
domain or other or non-video domain information within a transition stream being
generated. In this manner, a transition stream or transition clip may be generated in
response to many sources of information such that video and non-video information is
[...] video and/or non-video information from more than the two streams forming a
transition clip.

It should be noted that a transition clip or stream may be formed with a
predetermined number of video frames. As such, in addition to the previously described
VBV processing opportunities, the predetermined number of frames may be used to effect a
particular pixel domain effect by selective encoding of portions of frames. For example, for a
transition clip to have five video frames, each of the five frames may be divided into
intra-frame regions. The first frame includes 1/6 video data from the to-stream and 5/6
from the from-stream; the second frame includes 2/6 data from the to-stream and 4/6
data from the from-stream and so on up to the fifth frame, which includes 1/6 data from the
from-stream and 5/6 data from the to-stream. The inventors have determined that providing
user-selectable (or predetermined) numbers of frames between 3 and 25 frames in a
transition stream provides sufficient flexibility to enable most pixel domain processes and
VBV buffer normalization functions.
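
The proportions in the five-frame example follow a simple progression: the share of
to-stream data grows by 1/6 per frame while the from-stream share shrinks by the same
amount. A minimal Python sketch of that arithmetic is given below. The function name
region_fractions and the generalization to other frame counts (frame k of N taking
k/(N+1) from the to-stream) are illustrative assumptions, not part of the described method.

```python
from fractions import Fraction


def region_fractions(num_frames):
    """Return a list of (from_share, to_share) pairs, one per transition frame.

    Assumption: the linear progression of the five-frame example is used,
    so frame k of N takes k/(N+1) of its data from the to-stream.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be positive")
    shares = []
    for k in range(1, num_frames + 1):
        to_share = Fraction(k, num_frames + 1)
        shares.append((1 - to_share, to_share))
    return shares


# Fractions are printed in lowest terms, so 2/6 appears as 1/3.
for index, (from_share, to_share) in enumerate(region_fractions(5), start=1):
    print(f"frame {index}: from-stream {from_share}, to-stream {to_share}")
```

The two shares of every frame sum to one, so each frame is fully covered by data from
one stream or the other.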

FIG. 13 depicts a flow diagram of a method for generating a transition stream or 
transition clip according to an embodiment of the invention. Specifically, FIG. 13 depicts a 
flow diagram of a method 1300 suitable for use in the transition clip generation function 
344 of the Play to air server 110 of FIG. 3.

The method 1300 is entered at step 1310 where an appropriate portion of the from- 
stream video prior to an exit frame is decoded. The method 1300 then proceeds to step 
1320. 

At step 1320, non-video information such as data essence, audio, meta-data and/or
other data within the from-stream that is associated with the decoded video portion is
extracted or decoded. That is, auxiliary or ancillary data, such as the aforementioned non-
video data types, that are associated with the video frames within the from-stream decoded
at step 1310 are extracted or decoded for subsequent use in the transition stream or
transition clip.

At step 1330, an appropriate portion of the to-stream video beginning with an entry
frame is decoded. The method 1300 then proceeds to step 1340.

At step 1340, non-video data associated with the video frames decoded at step 1330 
is extracted or decoded. That is, data essence, audio, meta-data, and/or other data within the 
to-stream associated with the video frames decoded at step 1330 is extracted or decoded for 
subsequent use in the transition stream or transition clip. The method 1300 then proceeds to
optional step 1350. 

Step 1350 is an optional processing step suitable for use on a partially formed
transition stream or transition clip. Specifically, optional step 1350 includes three optional
sub-steps which may be utilized independently or in any combination to effect a processing
of the video and non-video information, including that extracted or decoded at steps 1320
and 1340.

A first optional sub-step 1352 within optional step 1350 comprises the performance
of any pixel domain processing. For example, the method 1200 described above with respect to
FIG. 12 may be used to process the from-stream and to-stream video decoded
at steps 1310 and 1330, respectively. The method 1300 then proceeds to step 1354.

At a second optional sub-step 1354 of step 1350, any audio domain processing of the
extracted or decoded audio data from steps 1320 and/or 1340 is performed. Such audio
processing may comprise any of the known audio domain processing techniques. The
method 1300 then proceeds to step 1356.

At a third optional sub-step 1356 of step 1350, any data domain processing of
extracted or decoded data essence, meta-data or other data that was extracted or decoded at
steps 1320 and/or 1340 is performed. Such data processing may comprise, for example,
adjustments to meta-data or data essence based upon the pixel domain processing
performed at step 1352. For example, if the meta-data describes a characteristic of a
transition clip video frame subjected to pixel domain processing, then the meta-data is
processed to reflect the corresponding pixel domain processing. Other data processing
functions may be implemented as well. The method 1300 then proceeds to step 1360.

At step 1360, the decoded and, optionally, processed video portions of the transition
stream are re-encoded, along with the audio, data essence, meta-data, and/or other data,
including non-video data processed at steps 1352 through 1356. That is, the optionally
processed video and non-video information produced by the preceding steps
is re-encoded or reinserted into a transport stream format to form a transition clip or
transition stream.
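
The flow of the method 1300 through its optional sub-steps may be sketched as follows.
This is an illustrative outline only: the DecodedPortion container, the build_transition
function, the use of strings to stand in for frames and data, and the simple
concatenation of the two portions are all assumptions made for the example, and no actual
decoding or re-encoding is performed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical signature for an optional processing sub-step.
Processor = Callable[[List[str]], List[str]]


@dataclass
class DecodedPortion:
    """Decoded video frames plus their associated non-video data."""
    frames: List[str]
    audio: List[str] = field(default_factory=list)
    metadata: List[str] = field(default_factory=list)


def build_transition(
    from_part: DecodedPortion,
    to_part: DecodedPortion,
    pixel_proc: Optional[Processor] = None,
    audio_proc: Optional[Processor] = None,
    data_proc: Optional[Processor] = None,
) -> DecodedPortion:
    """Gather both portions, apply whichever optional sub-steps are supplied,
    and return the material that would be handed to re-encoding (step 1360)."""
    frames = from_part.frames + to_part.frames          # steps 1310 and 1330
    audio = from_part.audio + to_part.audio             # steps 1320 and 1340
    metadata = from_part.metadata + to_part.metadata    # steps 1320 and 1340
    if pixel_proc is not None:                          # sub-step 1352
        frames = pixel_proc(frames)
    if audio_proc is not None:                          # sub-step 1354
        audio = audio_proc(audio)
    if data_proc is not None:                           # sub-step 1356
        metadata = data_proc(metadata)
    return DecodedPortion(frames, audio, metadata)


from_part = DecodedPortion(["F1", "F2"], ["a1", "a2"], ["m1", "m2"])
to_part = DecodedPortion(["T1", "T2"], ["b1", "b2"], ["n1", "n2"])

# Only the pixel domain sub-step is used here; the others are skipped.
clip = build_transition(from_part, to_part,
                        pixel_proc=lambda fs: [f + "*" for f in fs])
print(clip.frames)
```

Because each sub-step is an independent optional argument, the sub-steps can be used
alone or in any combination, mirroring the description of step 1350.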

In an embodiment of the invention described above with respect to FIG. 11C, a
plurality of packets are used to represent the video and non-video data. In this embodiment
of the invention, prior to forming a transition stream or transition clip, some portion of the
available packets utilized to hold information are reserved for non-video data purposes. In
this manner, the video information may be processed prior to the processing of any
non-video information such that data place holders proximate the video frames may be
interspersed among the video frames to include data relevant to those proximate video
frames. Thus, in this embodiment of the invention an optional step 1350 is used prior to
step 1310 of the method 1300 of FIG. 13. Specifically, at step 1350 data place holders are
included in the transition stream to be formed. That is, at step 1350 a portion of memory or
plurality of packets intended to be used for the transition stream are interspersed with place
holder information defining packets for non-video use. The method 1300 then proceeds
through step 1310 to step 1360.

Step 1360, per box 1365, utilizes the appropriate place holders to store non-video 
information such as optionally processed audio, meta-data, data essence and/or other data
related to the video frames. Upon completing the transition clip or upon processing all non- 
video information and locating such processed non-video information within appropriate 
place holders, unused place holders are removed or otherwise utilized for other purposes. 
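
The place holder technique may be sketched as follows. The sketch is a simplified
illustration under stated assumptions: packets are modeled as strings, the PLACEHOLDER
marker, the reserve_slots and fill_slots functions, and the fixed spacing of one place
holder per two video packets are all hypothetical, and real transport packets are not
constructed.

```python
PLACEHOLDER = None  # hypothetical marker for a reserved, still-empty packet slot


def reserve_slots(video_packets, every=2):
    """Intersperse one place holder after every `every` video packets."""
    layout = []
    for count, packet in enumerate(video_packets, start=1):
        layout.append(packet)
        if count % every == 0:
            layout.append(PLACEHOLDER)
    return layout


def fill_slots(layout, non_video_packets):
    """Store non-video packets in place holders, in order, and remove any
    place holders left unused. Non-video packets that do not fit are not
    stored in this simplified sketch."""
    pending = iter(non_video_packets)
    filled = []
    for slot in layout:
        if slot is PLACEHOLDER:
            slot = next(pending, PLACEHOLDER)
            if slot is PLACEHOLDER:
                continue  # unused place holder is removed
        filled.append(slot)
    return filled


layout = reserve_slots(["v1", "v2", "v3", "v4", "v5", "v6"], every=2)
stream = fill_slots(layout, ["audio1", "meta1"])
print(stream)
```

Here three place holders are reserved, two are filled with non-video data located near
the video packets they relate to, and the third is removed because it was never used.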

As previously noted, additional processing of the transition clip is used to ensure 
that the VBV of the from- and to-streams are accommodated in a manner providing for a
substantially seamless splicing operation. 
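
One accommodation consistent with the appended claims compares the VBV parameters of the
two streams and raises or lowers the rate control bit allocation used during re-encoding.
The sketch below is illustrative only: the function name adapt_bit_allocation, the fixed
step adjustment, the strict comparison against each threshold, and the unit-less numeric
values are assumptions made for the example.

```python
def adapt_bit_allocation(allocation, from_vbv, to_vbv,
                         first_threshold, second_threshold, step):
    """Return an adjusted rate control bit allocation for re-encoding.

    Assumption: a fixed `step` is added or subtracted; how large the
    adjustment should be is not specified here.
    """
    if from_vbv - to_vbv > first_threshold:
        # From-stream VBV parameter exceeds the to-stream parameter.
        return allocation + step
    if to_vbv - from_vbv > second_threshold:
        # To-stream VBV parameter exceeds the from-stream parameter.
        return max(0, allocation - step)
    # Difference is within both thresholds, so no adaptation is applied.
    return allocation


print(adapt_bit_allocation(1000, from_vbv=500, to_vbv=300,
                           first_threshold=100, second_threshold=100, step=50))
```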

The invention has been primarily described within the context of splicing or 
concatenating two single program transport streams, i.e., transport streams containing a 
single audio-visual program, such as a movie, television show or commercial. However,
those skilled in the art will appreciate that the invention provides frame accurate, seamless 
splicing between multi-program transport streams as well. To effect such a splice, the 
above-described methods are adapted to determine out-frames, in-frames and other 
appropriate parameters for each program within the multi-program transport streams. 

Although various embodiments which incorporate the teachings of the present 
invention have been shown and described in detail herein, those skilled in the art can readily 
devise many other varied embodiments that still incorporate these teachings.
What is claimed is:

decoding (1210) a portion of said first transport stream including at least a target
out-frame representing a last image frame of said first transport stream to be presented;

decoding (1230) a portion of said second transport stream including at least a target
in-frame;

processing (1240), using a pixel domain process (1245), at least one of said decoded
image frames; and

encoding (1250) a plurality of said decoded image frames, including said target out-
frame and said target in-frame, to produce said transition stream.

2. The method of claim 1, wherein said pixel domain process comprises at least one of
a morph, fade, wipe, dissolve, push, reveal, black-frame, freeze-frame and chroma-keying
pixel domain process.

3. The method of claim 1, further comprising the steps of:

extracting (1220, 1340), from said first and second transport streams, non-video data
associated with said video frames used to form said transition stream; and

inserting (1300), into said transition stream, said extracted non-video data.

4. The method of claim 3, wherein said non-video data comprises at least one of audio
data, meta-data, data essence, ancillary data and auxiliary data.



5. The method of claim 3, further comprising the step of:

processing (1350), using a non-video domain process, at least a portion of said
extracted non-video data.

6. The method of claim 4, wherein said step of encoding said plurality of decoded
image frames includes the step of transport encoding said encoded plurality of image frames,
said method further comprising the steps of:

reserving (1315) a plurality of transport packets within said transition stream, said
reserved packets not being utilized to store encoded image information; and

utilizing (1365) at least a portion of said reserved plurality of transport packets to
store said extracted non-video data.

7. The method of claim 3, wherein said first transport stream and said second transport
stream are multiplexed into respective first and second multiple program transport streams,
said method further comprising the steps of:

determining, for each multiple program transport stream including a transport stream
to be processed, a maximum extent of all image frames to be included in a transition stream;
and

demultiplexing each multiple program transport stream to accommodate its
respective determined maximum extent.

8. The method of claim 7, wherein said step of determining said image data extent 
includes the step of determining a maximum extent of all non-video data associated with 
image frames to be included in a transition stream, said maximum extent comprising a 
combination of the image data extent and the non-video data extent.



9. The method of claim 1, further comprising the step of indexing each of said first and
second transport streams, said step of indexing comprising the steps of:

parsing (.0.0) a transport layer of a stream to be indexed to identify packets
associated with at least one of sequence headers, picture headers and predefined splicing
syntax; and

determining, for each identified frame, at least one of a frame transport packet number,
a presentation time stamp (PTS) and a decode time stamp (DTS).

10. The method of claim 1, wherein said from-stream and said to-stream each comprise a
transport stream having associated with it a respective video buffering verifier (VBV)
parameter, said method further comprising the steps of:

determining if a difference exists between said from-stream VBV parameter and
said to-stream VBV parameter; and

adapting, in response to said determination, said step of re-encoding.

11. The method of claim 10, wherein said step of adapting comprises the steps of:

increasing a rate control bit allocation in response to a determination that said from-
stream VBV parameter exceeds said to-stream VBV parameter by a first threshold level;
and

decreasing said rate control bit allocation in response to a determination that said to-
stream VBV parameter exceeds said from-stream VBV parameter by a second threshold
level.